Implicit Compression Boosting with Applications to Self-indexing

Authors

  • Veli Mäkinen
  • Gonzalo Navarro

Abstract

Compression boosting (Ferragina & Manzini, SODA 2004) is a new technique that raises the performance of zeroth-order entropy compressors to k-th order entropy. It works by constructing the Burrows-Wheeler transform of the input text, finding an optimal partitioning of the transform, and then compressing each piece with an arbitrary zeroth-order compressor. The optimal partitioning has the property that the achieved compression is boosted to k-th order entropy, for any k. The technique has an application to text indexing: essentially, building a wavelet tree (Grossi et al., SODA 2003) for each piece of the partitioning yields a k-th order compressed full-text self-index that supports efficient substring searches on the indexed text (Ferragina et al., SPIRE 2004). In this paper, we show that explicit compression boosting is not necessary when using wavelet trees: our new analysis reveals that the size of the wavelet tree built for the complete Burrows-Wheeler transformed text is, in essence, the sum of the sizes of the wavelet trees built for the pieces of the optimal partitioning. Hence, the technique provides a way to perform compression boosting implicitly, with a trivial linear-time algorithm, but tied to a specific zeroth-order compressor (Raman et al., SODA 2002). Beyond these consequences for compression and for static full-text self-indexes, the analysis shows that a recent dynamic zeroth-order compressed self-index (Mäkinen & Navarro, CPM 2006) in fact occupies space proportional to k-th order entropy.
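
To make the entropy side of this concrete, the Python below is a minimal sketch of our own, not code from the paper. It builds the BWT by sorting the cyclic rotations of the text with an appended sentinel `$`, cuts the BWT into pieces whose rows share the same first k symbols, and checks that the sum of |piece| * H_0(piece) equals n * H_k computed directly from the text. Assumptions: `$` sorts before every text symbol, contexts are taken cyclically and as the k symbols following each symbol (the convention that matches BWT rows), and the quadratic rotation sort is for illustration only. The function names `bwt`, `h0`, `context_pieces`, and `nhk_direct` are hypothetical. This fixed-k partition is just one candidate; the point of boosting, per the abstract, is that the optimal partitioning reaches k-th order entropy for every k at once.

```python
from collections import Counter, defaultdict
from math import log2

SENTINEL = "$"  # assumption: sorts before every symbol of the text


def sorted_rotations(text: str) -> list[str]:
    """All cyclic rotations of text + sentinel, sorted (quadratic; illustration only)."""
    s = text + SENTINEL
    return sorted(s[i:] + s[:i] for i in range(len(s)))


def bwt(text: str) -> str:
    """Burrows-Wheeler transform: last column of the sorted rotation matrix."""
    return "".join(rot[-1] for rot in sorted_rotations(text))


def h0(seq: str) -> float:
    """Zeroth-order empirical entropy, in bits per symbol."""
    n = len(seq)
    if n == 0:
        return 0.0
    return -sum(c / n * log2(c / n) for c in Counter(seq).values())


def context_pieces(text: str, k: int) -> list[str]:
    """Cut the BWT into maximal runs of rows whose rotations share their first k symbols."""
    pieces: list[str] = []
    prev_ctx = None
    for rot in sorted_rotations(text):
        ctx = rot[:k]
        if ctx != prev_ctx:  # sorted order makes rows with equal contexts contiguous
            pieces.append("")
            prev_ctx = ctx
        pieces[-1] += rot[-1]
    return pieces


def nhk_direct(text: str, k: int) -> float:
    """n * H_k from the text itself: group each symbol by the k symbols that
    cyclically follow it, then sum |group| * H_0(group)."""
    s = text + SENTINEL
    n = len(s)
    groups = defaultdict(str)
    for i in range(n):
        ctx = "".join(s[(i + j) % n] for j in range(k))
        groups[ctx] += s[i - 1]  # s[-1] wraps around to the sentinel when i == 0
    return sum(len(g) * h0(g) for g in groups.values())


if __name__ == "__main__":
    text = "mississippi"
    b = bwt(text)
    print(b)  # ipssm$pissii
    for k in range(4):
        boosted = sum(len(p) * h0(p) for p in context_pieces(text, k))
        print(k, round(boosted, 3), round(nhk_direct(text, k), 3), round(len(b) * h0(b), 3))
```

For each k the first two totals agree up to floating-point rounding, and neither exceeds n * H_0 of the whole BWT (the third column), because the entropy of a concatenation is never below the length-weighted entropies of its parts. That gap is what boosting captures.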

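The self-indexing side can be sketched in the same hedged way. The Python below is our own minimal illustration, not the authors' implementation. It builds a balanced wavelet tree over the BWT, answers rank queries by descending the tree, and uses them in FM-index backward search to count pattern occurrences. Its bitvectors are plain lists with prefix counts, not the compressed bitvectors of Raman et al. on which the space bound depends, and the names `WaveletTree`, `fm_count`, `node_bitvectors`, and `concatenates` are hypothetical. The last helper checks one structural fact consistent with the claim above: when every tree uses the same alphabet split, each node bitvector of the tree over the whole BWT is exactly the concatenation of that node's bitvectors in the trees over any contiguous pieces, such as those of the optimal partitioning. The paper's analysis must additionally bound the cost of compressing these concatenated bitvectors, which this sketch does not model.

```python
from collections import Counter
from typing import Optional

SENTINEL = "$"  # assumption: sorts before every text symbol


def bwt(text: str) -> str:
    """BWT via sorted cyclic rotations of text + sentinel (quadratic; illustration only)."""
    s = text + SENTINEL
    return "".join(r[-1] for r in sorted(s[i:] + s[:i] for i in range(len(s))))


class WaveletTree:
    """Balanced wavelet tree over an explicit sorted alphabet. Bitvectors are plain
    lists with prefix counts of 1s, not a compressed (Raman et al.) representation."""

    def __init__(self, seq: str, alphabet: Optional[list] = None):
        self.alphabet = sorted(set(seq)) if alphabet is None else list(alphabet)
        self.bits = None  # None marks a leaf holding a single symbol
        if len(self.alphabet) <= 1:
            return
        mid = len(self.alphabet) // 2
        self.left_syms = set(self.alphabet[:mid])
        # bit 0 routes a symbol to the left child, bit 1 to the right child
        self.bits = [0 if c in self.left_syms else 1 for c in seq]
        self.ones = [0]  # ones[i] = number of 1s in bits[:i]
        for bit in self.bits:
            self.ones.append(self.ones[-1] + bit)
        self.left = WaveletTree("".join(c for c in seq if c in self.left_syms), self.alphabet[:mid])
        self.right = WaveletTree("".join(c for c in seq if c not in self.left_syms), self.alphabet[mid:])

    def rank(self, c: str, i: int) -> int:
        """Number of occurrences of symbol c in seq[:i]."""
        if c not in self.alphabet:
            return 0
        node = self
        while node.bits is not None:
            if c in node.left_syms:
                i, node = i - node.ones[i], node.left
            else:
                i, node = node.ones[i], node.right
        return i


def fm_count(bwt_str: str, pattern: str) -> int:
    """FM-index backward search: number of occurrences of pattern in the text."""
    wt = WaveletTree(bwt_str)
    counts = Counter(bwt_str)
    C, total = {}, 0  # C[c] = number of text symbols smaller than c
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    lo, hi = 0, len(bwt_str)  # current range [lo, hi) of sorted rotations
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo, hi = C[c] + wt.rank(c, lo), C[c] + wt.rank(c, hi)
        if lo >= hi:
            return 0
    return hi - lo


def node_bitvectors(wt: WaveletTree, path: str = "") -> dict:
    """Map each internal node ('' = root, then 'L'/'R' steps) to its bitvector."""
    if wt.bits is None:
        return {}
    out = {path: wt.bits}
    out.update(node_bitvectors(wt.left, path + "L"))
    out.update(node_bitvectors(wt.right, path + "R"))
    return out


def concatenates(seq: str, cuts: list) -> bool:
    """True if each node bitvector of the tree over seq equals the concatenation of the
    same node's bitvectors in trees built (with the same alphabet) over the pieces."""
    alphabet = sorted(set(seq))
    bounds = [0, *cuts, len(seq)]
    pieces = [seq[a:b] for a, b in zip(bounds, bounds[1:])]
    whole = node_bitvectors(WaveletTree(seq, alphabet))
    parts = [node_bitvectors(WaveletTree(p, alphabet)) for p in pieces]
    return all(bv == [bit for part in parts for bit in part[path]] for path, bv in whole.items())


if __name__ == "__main__":
    b = bwt("mississippi")
    print(fm_count(b, "ssi"), fm_count(b, "issi"), fm_count(b, "x"))  # 2 2 0
    print(concatenates(b, [3, 7]))  # True
```

With plain bitvectors this tree occupies roughly n log σ bits plus rank overhead; per the abstract, it is the combination with the zeroth-order compressor of Raman et al. that yields space proportional to k-th order entropy.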

Similar resources

Fixed Block Compression Boosting in FM-Indexes

A compressed full-text self-index occupies space close to that of the compressed text and simultaneously allows fast pattern matching and random access to the underlying text. Among the best compressed self-indexes, in theory and in practice, are several members of the FM-index family. In this paper, we describe new FM-index variants that combine nice theoretical properties, simple implementati...

The Engineering of a Compression Boosting Library: Theory vs Practice in BWT Compression

Data Compression is one of the most challenging arenas both for algorithm design and engineering. This is particularly true for Burrows and Wheeler Compression, a technique that is important in itself and for the design of compressed indexes. There has been considerable debate on how to design and engineer compression algorithms based on the BWT paradigm. In particular, Move-to-Front Encoding is...

Boosting Text Compression with Word-Based Statistical Encoding

Semistatic word-based byte-oriented compressors are known to be attractive alternatives to compress natural language texts. With compression ratios around 30-35%, they allow fast direct searching of compressed text. In this article we reveal that these compressors have even more benefits. We show that most of the state-of-the-art compressors benefit from compressing not the original text, but t...

Grid Integration of a Single-Source Switched-Capacitor Multilevel Inverter with Boosting Capability

This paper investigates the connection of a single-phase multilevel inverter structure to an existing power grid. The applied voltage source inverter is able to generate a near sinusoidal voltage waveform with an amplitude six times the input voltage by using a single DC input power supply. A proportional-resonant (PR) controller regulates the injected current into the grid. While the EPLL is u...

Recovering from negative events by boosting implicit positive affect.

Upregulation of implicit positive affect (PA) can act as a mechanism to deal with negative affect. Two studies tracked temporal changes in positive and negative affect (NA) assessed by self-report and the Implicit Positive and Negative Affect Test (IPANAT; Quirin, Kazen, & Kuhl, 2009). Study 1 observed the predicted increases in implicit PA after exposure to a threat-related film clip, which co...

Publication date: 2007